
Wikipedia talk:Large language models/Archive 4


Creating drafts with AI before verification

I want your thoughts on the practice of creating a draft article with AI as a starting point before verifying all points in it. I see this as a potentially useful strategy for making a well-flowing starting point to edit. Immanuelle ❤️💚💙 (talk to the cutest Wikipedian) 18:13, 3 April 2023 (UTC)

Hello, Immanuelle. This is an automated technique for writing an article the wrong way. The best practice is to identify several reliable, independent sources that devote significant coverage to the topic. AI tends to be indiscriminate about sources. Please read Wikipedia:Writing Wikipedia articles backward. Cullen328 (talk) 18:33, 3 April 2023 (UTC)
Aside from the issue identified by Cullen, which I completely agree with, there's the possibility that another editor might come across an abandoned draft and assume that it just needs to be copyedited before moving to mainspace. This is particularly concerning when an article contains fabricated facts and fabricated sources, since WP:AGF would lead an editor to assume that the content is legitimate and that the sources are simply difficult to find and access. –dlthewave 20:33, 3 April 2023 (UTC)
Personally, I feel that writing exercises with unvetted information aren't suitable for submission to any page on Wikipedia. I think the collaborative process is better served when the content being shared has undergone some degree of review by the editor with respect to accuracy and relevance. Otherwise it's indistinguishable from anything made up. isaacl (talk) 21:55, 3 April 2023 (UTC)

Noticeboard for AI generated things

AI generated articles show up a lot on ANI. I think it might be helpful to add a dedicated noticeboard for this stuff. IHaveAVest talk 02:01, 4 April 2023 (UTC)

Applicable TOS

The Terms of use for ChatGPT[1] say As between the parties and to the extent permitted by applicable law, you own all Input. Subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. This means you can use Content for any purpose, including commercial purposes such as sale or publication, if you comply with these Terms.

In threads above there is a link to a Sharing & publication policy[2] with attribution requirements. It's not clear to me whether this is generally in force. I think it may be meant for invited research collaboration on products that aren't yet publicly available. Sennalen (talk) 16:47, 7 April 2023 (UTC)

Outright falsification

This doesn't go far enough: "LLM-generated content can be biased, non-verifiable, may constitute original research, may libel living people, and may violate copyrights." LLMs also blatantly falsify both citations (creating plausible-looking cites to non-existent sources) and quotations (making up fake quotes from real sources).  — SMcCandlish ¢ 😼  09:28, 13 April 2023 (UTC)

added something—Alalch E. 10:09, 13 April 2023 (UTC)

Attribution section

This section might be the part I have the most issues with. It uses "in-text attribution" incorrectly, and it requires the use of {{OpenAI}} for OpenAI LLMs, which blurs the line between OpenAI TOS and Wikipedia policy (and doesn't comply with OpenAI's ToS anyway). I also don't think we've reached a consensus on how to attribute, or resolved the outstanding issues involving inline attribution, whatever we go with. DFlhb (talk) 03:31, 1 April 2023 (UTC)

You are right about the lack of consensus. Since we have already discussed the issue at length, it would probably be best to do an RfC on this issue, as suggested previously by dlthewave. We need to clarify whether or in what cases the following are needed: attribution using a template (top or bottom of the page), in-line attribution, in-text attribution, and edit summaries. Maybe split it up into 2 RfCs: one for the type of attribution on the page and one for edit summaries. Phlsph7 (talk) 06:15, 1 April 2023 (UTC)
RfC yes, but held at WP:VPP, with a pre-RfC at WP:VPI first, because we've become too insular and we need outside input on this, rather than something only we (and WP:FRS) will see. I also think we don't need to bother with an RfC on an edit summary requirement, since that has a low chance of gaining consensus; we should come up with something else.
The question is: do we think it's wise to hold the VPI discussion now, or do we want to wait a little, to give the community time to get used to LLM misconduct and organically experiment with solutions? And secondly, before we go to VPI, would it be worthwhile for each of us here to post a short one/two sentence summary of our positions on attribution, so we clarify lines of agreement and disagreement? I'll admit I lost track, and don't really know where anyone stands. DFlhb (talk) 06:39, 1 April 2023 (UTC)
I'm not sure what the best process is here in terms of WP:VPP and WP:VPI. To simplify the process, it might be a good idea to get as many options as possible off the list so it's more likely that some kind of consensus is reached.
Attribution makes the use of LLMs transparent to readers and makes it easier to detect (and forestall) misuse but it also makes appropriate use more difficult. The decision should probably be based on arriving at some kind of balance between these points. I advocated the use of in-text attribution earlier but this may make appropriate use too difficult so, as far as I'm concerned, we could take it off the list.
Concerning edit summaries: the discussion you mentioned is about edit tags (like the tag Mobile edit), not edit summaries, and the consensus (or lack thereof) is disputed in the discussion. If we decide against in-line attribution, edit summaries would be important to track which contents were produced by LLMs since a general attribution template only indicates that they were used somewhere on the page.
Besides the type of attribution, we would also need to narrow down the conditions under which it (and edit summaries) is necessary. Some of the relevant options would be:
  • no attribution is required in any case
  • attribution is required for all changes to articles and drafts
  • attribution is required for all non-trivial changes to articles and drafts (however we want to define "non-trivial" here, maybe as non-minor changes?)
  • attribution is required for all changes to articles and drafts if the LLM produced new information (this would be particularly difficult to check)
    • or alternatively: attribution is required for all changes to articles and drafts that add new claims (i.e. excluding copyedits and summaries of contents already present in the article)
I'm not sure about the best option myself but I would tend to require an attribution template at the bottom for non-trivial changes or for introducing new claims, together with an edit summary. Phlsph7 (talk) 08:55, 1 April 2023 (UTC)
I'd support requiring attribution in all cases; this ties the editor to their responsibility. If minor changes were made with AI, by definition, they could have been made without AI. Iseult Δx parlez moi 17:17, 1 April 2023 (UTC)
To take an example: grammar checkers are examples of programs using language models, and these days I assume the better ones have been built with AI-training techniques. Having a blanket rule would essentially mean anyone using a grammar checker would be required to make a disclosure, even for one being applied automatically by your browser. I think any potential value of disclosures is at risk of being lost in this situation. isaacl (talk) 17:55, 1 April 2023 (UTC)
I'm unclear on what additional value AI-guided grammar checkers have over older ones. It's also unclear which checkers, aside from Grammarly, use AI. My checker in Word 2012 certainly doesn't. In any case, I oppose giving AI latitude as a rule; however, given that checkers tend to require individual checkoffs by the user, which reduces it to the role of drawing attention to potential errors or improvements, this doesn't fall under the greater concern of AI-generated content. Iseult Δx parlez moi 18:41, 1 April 2023 (UTC)
I think we should use the term "disclosure" in any broader conversation. Sources are attributed using citations, but I believe there is agreement that for now the output of these types of programs is being treated as just generated text that must be cited as necessary to suitable sources.
I think the key points to discuss are at what point disclosure becomes desirable, and for whom the disclosure is targeted. I think neither editors nor readers are interested in having the use of grammar checkers disclosed. Readers and editors may be interested when substantial portions of the text have come from a program, as they might feel dubious about the amount of editorial oversight. Editors might be concerned as well with copy editing, in order to verify fidelity of the changes (though that's not a concern limited to copyediting done by programs). Personally I think the relevant consideration is more about how a writing tool is used, rather than the nature of the writing tools. Unfortunately, that's a fairly nebulous thing for anyone other than the author of the change to label, as I suspect numerical thresholds such as a changed-word percentage are going to have a large zone of uncertainty. isaacl (talk) 17:46, 1 April 2023 (UTC)
Phlsph7, isaacl, that isn't what I meant by "short". We're gonna keep going around in circles if we keep veering into minutiæ. We don't need to care about grammar checkers or search engines, because people will use common sense to interpret this policy. DFlhb (talk) 17:59, 1 April 2023 (UTC)
I wasn't summarizing my position on disclosure. I was raising what I think should be considered in planning a broader discussion on disclosure. I feel we need to think about the considerations that editors will raise, to try to figure out a concise question, or set of options to initially provide. isaacl (talk) 18:41, 1 April 2023 (UTC)
I was not under the impression that I was going in circles but I apologize if I was. My position is explained in the last sentence of my previous post. The other sentences concern the different options to be presented at the RfC(s). One RfC could be about the conditions under which attribution is required. If we present the options as a continuum we increase the chances that some kind of consensus is reached. The continuum could be something like the following: attribution for LLM-assisted edits is required...
  1. for all changes, including usage on talk pages
  2. for all changes to articles and drafts
  3. for all non-trivial/non-minor changes to articles and drafts
  4. for all changes that add substantial material with new claims to articles and drafts
  5. for no changes
Are there any other relevant options to be added? An alternative would be to just ask an open-ended question like "Under what circumstances is attribution for LLM-assisted edits required?" and see what everyone comes up with. The danger of this approach is that it may be much harder to reach a consensus. This would be better for brainstorming than for reaching a consensus.
Once we have this issue pinned down, we could have a 2nd RfC about what type of attribution and/or edit summaries are required. Phlsph7 (talk) 18:44, 1 April 2023 (UTC)
I am not sure what to make of this, but it's worth noting that of the two current uses of this template, one is crediting "[Bing] by OpenAI" - the Bing chat tool does not seem to have the same kind of TOS as the OpenAI one does. Andrew Gray (talk) 14:59, 1 April 2023 (UTC)
  • My opinion is that in the interest of transparency and accountability, AI-generated content should be disclosed to the reader where they'll see it. This means using some sort of in-text notice or template at the section level. AI-assisted minor edits (copyediting etc) can be disclosed in an edit summary. This should be a Wikipedia policy that stands on its own regardless of the LLM's requirements.
This seems to be the direction that publications are leaning toward: Medium, Science, Robert J Gates and NIEHS have good explanations of why they suggest or require disclosure of AI use. –dlthewave 16:19, 1 April 2023 (UTC)
A template at the section level isn't appropriate because templates aren't meant to be permanent features of articles. They are used for things that are meant to be fixed or changed, allowing the template to be removed in the future. In this case, such text would be intended as a permanent addition. Instead of a template, either a note at the top of the references list (such as an [a] note) stating that the text was partially made with an LLM (similar to what we do when text is used wholesale from public domain works) or just an icon tag at the top right of the article, such as what we use for protection tags, would suffice. SilverserenC 17:59, 1 April 2023 (UTC)

If we are going to attribute content added with the assistance of LLM, there are two things to keep in mind:

  • the requirements of the Terms of Use of the LLM provider, and
  • the requirements of the Terms of Use of Wikimedia.

As there are legal implications to both of these, that means that regardless of our discussions here or what anyone's opinion or preference here is, the ultimate output of any policy designed here, must be based on those two pillars, and include them. Put another way: no amount of agreement or thousand-to-one consensus here, or in any forum in Wikipedia, can exclude or override any part of the Terms of Use of either party, period. We may make attribution requirements stricter, but not more lax than the ToU lays out. (In particular, neither WP:IAR nor WP:CONS can override ToU, which has legal implications.)

Complying with Terms of Use at both ends

I'm more familiar with Wikimedia's ToU than ChatGPT's (which is nevertheless quite easy to understand). The page WP:CWW interprets the ToU for English Wikipedia users; it is based on Wikimedia's wmf:Terms of use, section 7. Licensing of Content, sub-sections b) Attribution, and c) Importing text. There's some legalese, but it's not that hard to understand, and amounts to this: the attribution must state the source of the content, and must 1) link to it, and 2) be present in the edit summary. The WP:CWW page interpretation offers some suggested boilerplate attribution (e.g., Content in this edit was copied from [[FOO]]; see that article's history for attribution.) for sister projects, and for outside content with compatible licenses. (One upshot of this is that *if* LLM attribution becomes necessary, suggestions such as one I've seen on the project page to use an article-bottom template will not fly.)

Absent any update to the WMF ToU regarding LLM content, we are restricted only by the LLM ToU at the moment. The flip side of this is that one has to suspect or assume that WMF is currently considering LLM usage and attribution, and if and when they update the ToU, the section in any proposed new LLM policy may have to be rewritten. The best approach for an attribution section now, in my opinion, is to keep it as short as possible, so it may be amended easily if and when WMF updates its ToU for LLMs. In my view, the attribution section of our proposed policy should be short and inclusive, without adding other frills for now, something like this:

Any content added to Wikipedia based wholly or in part on LLM output must comply with:
  • the Terms of Use of the LLM provider, and
  • Wikipedia's attribution policy for copied content, in particular, an attribution statement in the edit summary containing a link to the LLM provider's Terms of Use.

Once WMF addresses LLMs, we could modify this to be more specific. (I'll go ask them and find out, and link back here.)

We may also need to expand and modify it for each flavor of LLM. ChatGPT's sharing/publication policy is quite easy to read and understand. There are four bullets, and some suggested "stock language". I'd like to address this later, after having a chat with WMF.

Note that it's perfectly possible that WMF may decide that attribution to non-human agents is not needed, in which case we will be bound only by the LLM's ToU; but in that case, I'd advocate for stricter standards on our side; however, it's hard to discuss that productively until we know what WMF's intentions are. (If I had to guess, I would bet that there are discussions or debates going on right now at WMF legal about the meaning of "creative content", which is a key concept underlying the current ToU, and if they decide to punt on any new ToU, they will just be pushing the decision about what constitutes "creative content" downstream onto the 'Pedias, which would be disastrous, imho; but I'm predicting they won't do that.) I'll report back if I find anything out. Mathglot (talk) 03:51, 10 April 2023 (UTC)

Wikipedia:Copying within Wikipedia is about copying content from one Wikipedia page to another, and so doesn't apply with respect to how editors incorporate content derived from external sources. The Licensing of Content section in the terms of use does have a subsection c, "Importing text", which states: ...you warrant that the text is available under terms that are compatible with the CC BY-SA 3.0 license (or, as explained above, another license when exceptionally required by the Project edition or feature)("CC BY-SA"). ... You agree that, if you import text under a CC BY-SA license that requires attribution, you must credit the author(s) in a reasonable fashion. It gives attribution in the edit summary as an example for copying within Wikimedia projects, but doesn't prescribe this as the only reasonable fashion. Specifically regarding OpenAI, though, based on its terms of use, it assigns all rights to the user. So even if the U.S. courts one day ruled that a program could hold authorship rights, attribution from a copyright perspective is not required. OpenAI's sharing and publication policy, though, requires that The role of AI in formulating the content is clearly disclosed in a way that no reader could possibly miss, and that a typical reader would find sufficiently easy to understand.
Wikipedia terms of use section 7, subsection c further states The attribution requirements are sometimes too intrusive for particular circumstances (regardless of the license), and there may be instances where the Wikimedia community decides that imported text cannot be used for that reason. In a similar manner, it may be the case that the community decides that enabling editors to satisfy the disclosure requirement of the OpenAI sharing and publication policy is too intrusive. isaacl (talk) 04:52, 10 April 2023 (UTC)
Yes indeed; CWW is only about copying content from one Wikipedia page to another (or among any Wikimedia property, or other compatibly licensed project), because it is an interpretation of the ToU for en-wiki. The point I was trying to make, not very clearly perhaps, is that the wording at CWW offers us a model of what we might want to say about LLMs, as long as we take into consideration their ToU, as well as whatever ends up happening (if anything) with the wmf ToU (which, by the way, is scheduled for an update, and discussion is underway now and feedback is open until 27 April to anyone who wishes to contribute). Mathglot (talk) 08:27, 10 April 2023 (UTC)
You proposed following Wikipedia:Copying within Wikipedia for attribution, which in essence means extending it to cover more than its original purpose of ensuring compliance with Wikipedia's copyright licensing (CC BY-SA and GFDL). Personally I think it would be better to preserve the current scope of the copying within Wikipedia guideline, both to keep it simpler and to avoid conflating disclosure requirements with copyright licensing issues. isaacl (talk) 16:12, 10 April 2023 (UTC)
I like your proposed phrasing. True that we need the WMF's input; they published meta:Wikilegal/Copyright Analysis of ChatGPT a few weeks back, but it seems non-committal. As for the ToS of other LLM providers, Bing Chat only allows use for a "personal, non-commercial purpose", so it's straightforwardly not compatible. DFlhb (talk) 10:00, 10 April 2023 (UTC)

Content about attribution was not good and I've removed it

DFlhb had removed it as part of their reverted trim, and now I've removed it again. This topic is covered in wmf:Terms of Use/en. No useful specific guidance was provided here. There's no agreement that a policy needs to require use of Template:OpenAI, as it is not obviously compatible with OpenAI ToS requirements. Editors advocating specific guidance about requiring attribution on this page should get consensus for the concrete version of the text that they are committed to and want to see become Wikipedia policy. —Alalch E. 11:41, 14 April 2023 (UTC)

LLMs on Wikisource

It has been proposed to me on Wikisource that LLMs would be useful for predicting and proposing fixes to transcription errors. Is there a place to discuss how such a thing might technically be implemented? BD2412 T 23:00, 15 April 2023 (UTC)

@BD2412: Do you know about LangChain? It's by far the most serious platform for building apps from LLMs in an open non-proprietary way. Although the guy behind it is on Twitter, he and others are far more responsive on their Discord server. Good luck! Sandizer (talk) 13:16, 16 April 2023 (UTC)
Great, thanks! BD2412 T 13:27, 16 April 2023 (UTC)
It was me who asked the OP in this thread, based on their OCR cleanup edits at English Wikisource. Naturally, any developed LLM OCR/scan error finder would have to be approved by the applicable community process before widespread use. ShakespeareFan00 (talk) 17:16, 16 April 2023 (UTC)
I believe that today's commercial OCR software does include language models for error correction, but while they are not "large" as in LLMs, I believe they are substantially larger than typical autocorrect systems. A very good correction system involving LangChain and Pywikibot should be possible to make from open ~7B size models (e.g. Dolly, see Ars Technica's summary) which run fairly fast on typically four ordinary server CPU cores. It should be possible for project communities to thoroughly test such at a sufficiently large scale to find any issues which might cause serious problems. I suspect that corrections can be automatically classified into those which should require human review, and those which most probably don't need it. Sandizer (talk) 17:29, 16 April 2023 (UTC)
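To make this concrete, here is a minimal, untested sketch of what such a correction helper could look like. It skips LangChain and Pywikibot and calls the MediaWiki API directly, as the scripts later on this page do; the local-model endpoint, the suggest_correction prompt and helper, and the 0.98 diff-ratio threshold for flagging corrections for human review are all assumptions invented for illustration, not features of any existing tool.
from requests import get, post
import difflib

WIKISOURCE_API = 'https://en.wikisource.org/w/api.php'
LOCAL_LLM_URL = 'http://localhost:8000/complete' # hypothetical local ~7B model endpoint

def pagetext(title): # fetch the raw wikitext of a Wikisource page
  resp = get(WIKISOURCE_API, params={
    'action': 'query', 'format': 'json', 'titles': title,
    'prop': 'revisions', 'rvprop': 'content', 'rvslots': 'main'}).json()
  page = list(resp['query']['pages'].values())[0]
  return page['revisions'][0]['slots']['main']['*']

def suggest_correction(paragraph): # hypothetical call to a locally hosted model
  resp = post(LOCAL_LLM_URL, json={
    'prompt': 'Fix only OCR/scan errors in the following text; change nothing else:\n\n'
              + paragraph,
    'max_tokens': 1000})
  return resp.json()['text'].strip()

def classify(original, corrected): # crude triage: large diffs need human review
  ratio = difflib.SequenceMatcher(None, original, corrected).ratio()
  return 'needs human review' if ratio < 0.98 else 'probably safe'

def check_page(title): # print a classified diff for every paragraph the model would change
  for par in pagetext(title).split('\n\n'):
    if not par.strip():
      continue
    fixed = suggest_correction(par)
    if fixed != par:
      print(classify(par, fixed) + ':')
      for line in difflib.unified_diff(par.split('\n'), fixed.split('\n'), lineterm=''):
        print(' ', line)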
I think we ought to focus more on specific uses of software rather than the implementation, because that's not always published by the provider and can change rapidly. If there is concern regarding using OCR programs, that should be addressed regardless of how the programs are implemented. isaacl (talk) 21:05, 16 April 2023 (UTC)

BTW I created an article for LangChain which has a couple good starter resources. Sandizer (talk) 04:36, 18 April 2023 (UTC)

Talk pages / Non-article content

The current text starts by granting permission to use LLM text as a basis for discussion on Talk pages. But how is this ever going to be appropriate as part of the process of building an encyclopedia (as opposed to FORUM-esque discussion)? The prohibition was for using LLMs for 'arguing your case', but the problem I'm seeing is not just this, but people using them for random[3] contributions; and if they're used for closing RfCs/AfDs etc...? Arghh. I have tried to clarify. Bon courage (talk) 02:06, 7 April 2023 (UTC)

The idea was to allow people to quote from LLM outputs as part of discussions of how appropriate LLMs are for Wikipedia (i.e. here, or at ANI). Not at all to allow these kinds of junk contributions. Agree that it's so unclear that it's almost counterproductive. DFlhb (talk) 10:36, 8 April 2023 (UTC)
When it comes to what's between meta LLM discussions and junk comments, the change from you should not use LLMs to "argue your case for you" in talk page discussions to you must not use LLMs to write your comments was in my opinion a pretty significant change in meaning here. To be clear, if a less-than-confident English speaker has a good argument, an argument of their own construction, should they be allowed to use an LLM to work out the phrasing and essentially have it "write their comment"? Or do we just say that competence is required and that editors who can't phrase their own arguments should not be on talk pages to begin with? PopoDameron ⁠talk 10:46, 8 April 2023 (UTC)
Personally, I'd rather read poorly-written human comments than "LLM-assisted" comments where it's unclear how much "assistance" the LLM gave. DFlhb (talk) 11:28, 8 April 2023 (UTC)
Same. —Alalch E. 11:30, 8 April 2023 (UTC)
Yup, if they're not competent in English they won't be competent to assess whether the automatic content accurately represents their thought anyway. Bon courage (talk) 12:14, 8 April 2023 (UTC)
Good point. Probably for the best (not that we can truly enforce that, but just in terms of policy). PopoDameron ⁠talk 19:12, 8 April 2023 (UTC)
I would simply remove this entire line:
while you may include an LLM's raw outputs as examples in order to discuss them or to illustrate a point
I feel it's redundant and just opens the door to people throwing LLM-generated "arguments" into a conversation, then claiming it's to illustrate a point as a get-out-of-jail-free card. If someone does need to illustrate a point with generated text, a simple "Would someone object if I pasted an example here?" on the Talk page should be good enough to get a yes/no out of the participants. — The Hand That Feeds You:Bite 12:47, 8 April 2023 (UTC)
Agree. With the current text ("you must not use LLMs to write your comments") the door is already open to use LLMs, so long as it was clear you weren't trying to pass the stuff off as your own comment. Bon courage (talk) 12:58, 8 April 2023 (UTC)
I did the trimming. —Alalch E. 17:58, 8 April 2023 (UTC)
What about using it for grammar improvement? For example, I gave my text to ChatGPT and asked it to improve the grammar. Is that allowed?--Shrike (talk) 18:08, 8 April 2023 (UTC)
I personally wouldn't trust these systems to just improve the grammar, because they may change the entire meaning of the sentence. You'd be better off using a dedicated online grammar-checking tool. — The Hand That Feeds You:Bite 18:32, 8 April 2023 (UTC)
I think that I, as an editor, can see when the meaning is changed and when it's not, and I take full responsibility for the edit, but you may have a point that this should be allowed only for experienced editors. Shrike (talk) 19:03, 8 April 2023 (UTC)
For this I will simply repeat what I said earlier: talk page comments generated via LLMs are a fundamentally different thing from presenting to our readers quasi-factual LLM statements (which can be objectively wrong) or LLM-assisted manipulations of data (which can contain objective errors, and thus be objectively wrong). That is the risk, not sentimental ideas about "real human communication." For instance, most of the AfD contributions linked above would be just as irrelevant if they were bespoke artisanal emanations from a real human being's soul, and others, such as Wikipedia:Articles for deletion/Uzair Aziz, are basically no different from the routine boilerplate "non-notable, delete" rationales posted by humans. The problem here is the pattern of disruptive edits to AfDs -- it seems like there may be some kind of personal dispute involved on the editor's part, based on their contribs -- not the way they were written.
Furthermore, this is simply unenforceable. "AI-checking tools" are not infallible and will quickly become outdated as models change, and there is no clean line between "bad" LLMs and "acceptable" "online grammar-checking tools" since many grammar-checking tools have already incorporated LLMs, including those from major companies like Microsoft. Gnomingstuff (talk) 16:32, 11 April 2023 (UTC)

Proposed merge

This proposal was made by a sockpuppet of a blocked user, and it doesn't look like it's going anywhere. —David Eppstein (talk) 01:39, 13 April 2023 (UTC)
The following discussion has been closed. Please do not modify it.

I'm proposing merging Wikipedia:Using neural network language models on Wikipedia into Wikipedia:Large language models. I think the sections of the former would fit well into WP:LLM, and it would ease the process of creating a new guideline on ChatGPT. – CityUrbanism 🗩 🖉 18:48, 10 April 2023 (UTC)

AI seems like a semiautomated (bot-like) tool to me, and I am missing the mention of already existing policies like WP:MEATBOT or WP:BOTUSE on this page. I believe AI is a good thing for Wikipedia; I imagine AI crawling through the stubs and expanding the ones that are expandable. What I do not think is good for Wikipedia is if AI is used to generate thousands of stubs. Paradise Chronicle (talk) 05:55, 21 April 2023 (UTC)

There was a very prominent mention, but various changes and copyedits removed the explicit reference. MEATBOT is piped in this paragraph: You must not use LLMs for unapproved bot-like editing, or anything even approaching bot-like editing. Using LLMs to assist high-speed editing in article space is always taken to fail the standards of responsible use, as it is impossible to rigorously scrutinize content for compliance with all applicable policies in such a scenario. —Alalch E. 11:37, 22 April 2023
This was asked at the Bot Noticeboard in December 2022. The response was "it depends", based on speed/scale of editing and whether it becomes disruptive. This opinion seems to differ from WP:BOTDEF, which puts all bot-related activity (including lower-speed assisted editing) under the purview of BAG. I think it would make sense to add an explicit requirement that all LLM use be approved by BAG. –dlthewave 02:27, 23 April 2023 (UTC)

For those who don't explicitly admit they use an LLM, the whole "policy" is useless. I'd prefer that editing patterns which appear to fall under LLM use be the focus of the policy. If editors refuse to admit that they use AI, it will be similar to MEATBOT or MASSCREATE, which are also hardly applied, and even defining who actually falls under those two policies is a problem. Paradise Chronicle (talk) 08:33, 23 April 2023 (UTC)

This just seems to be an extension of the argument in the previous section. — The Hand That Feeds You:Bite 11:34, 23 April 2023 (UTC)
Re-reading this, I believe HandThatFeeds is correct and I therefore removed the section header.Paradise Chronicle (talk) 13:08, 24 April 2023 (UTC)

Barely-working proof of concept for automated verification

Yesterday I tried to make a script which uses an LLM to check whether an article's cited sources support its text. I'm still working on that, but it's quite a bit more of an undertaking than I first thought. In the meantime, this script will take the plain text of an article (with all references stripped), select a number of passages it thinks should have a source, use web search to try to find some, and in a small fraction of such cases, it does actually work. I'm leaving it here as a proof of concept for how such tasks might be approached. (@BD2412: it's much easier this way than using LangChain, although I'm not sure it's exactly better because I had to add special cases for e.g. string cleanup and un-clicktracking search results, for which I think LangChain has some under-the-hood support.) Anyway, here it is:

Python code to use Anthropic Claude and DuckDuckGo to try to verify article claims, with sample output
!pip install anthropic
import anthropic
 
from requests import get
from bs4 import BeautifulSoup
from urllib.parse import unquote
 
llm = anthropic.Client('[ api key from https://console.anthropic.com/account/keys ]')
 
def claude(prompt): # get a response from the Anthropic Claude v1.3 LLM
  return llm.completion(model='claude-v1.3', temperature=0.85,
    prompt=f'{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}',
    max_tokens_to_sample=5000, stop_sequences=[anthropic.HUMAN_PROMPT]
    )['completion']
 
def wikiarticle(title): # multiline wiki article text string without references
  try:
    plaintext = list(get('https://en.wikipedia.org/w/api.php', params=
      {'action': 'query', 'format': 'json', 'titles': title,
       'prop': 'extracts', 'explaintext': True}
      ).json()['query']['pages'].values())[0]['extract']
  except:
    return '[Article not found; respond saying the article title is bad.]'
  if plaintext.strip() == '':
    return '[Article text is empty; respond saying the title is a redirect.]'
  return plaintext
 
def passagize(title): # get passages in need of verification and search queries
  atext = wikiarticle(title)
  aintro = atext.split('==')[0].strip()
  passages = claude('For every passage which should have a source citation'
    + ' in the following Wikipedia article, provide each of them on separate'
    + ' lines beginning with "### ", then the excerpt, then " @@@ ", and'
    + ' finally a web search query you would use to find a source to verify'
    + ' the excerpt. Select at least one passage for every three sentences:\n\n'
    + atext[:16000]).split('###') # truncate article to context window size
  pairs = []
  for p in passages:
    pair = p.strip().split('@@@')
    if len(pair) > 1:
      passage = pair[0].strip()
      query = pair[1].strip()
      if passage[0] == '"' and passage[-1] == '"':
        passage = passage[1:-1]
      if query[0] == '"' and query[-1] == '"': # fully quoted query not intended
        query = query[1:-1]
      pairs.append((passage, query))
  return pairs
 
def duckduckget(query): # web search: first page of DuckDuckGo is usually ~23 results
  page = get('http://duckduckgo.com/html/?q=' + query
    + ' -site:wikipedia.org', # ignore likely results from Wikipedia
    headers={'User-Agent': 'wikicheck v.0.0.1-prealpha'})
  if page.status_code != 200:
    print('  !!! DuckDuckGo refused:', page.status_code, page.reason)
    return []
  soup = BeautifulSoup(page.text, 'html.parser')
  ser = []; count = 1
  for title, link, snippet in zip(
      soup.find_all('a', class_='result__a'),
      soup.find_all('a', class_='result__url', href=True),
      soup.find_all('a', class_='result__snippet')):
    url = link['href']
    if url[:25] == '//duckduckgo.com/l/?uddg=': # click tracking
      goodurl = unquote(url[25:url.find('&rut=',len(url)-70)])
    else:
      goodurl = url
    ser.append((str(count), title.get_text(' ', strip=True), goodurl,
        snippet.get_text(' ', strip=True)))
    count += 1
  #print('  DDG returned', ser)
  return ser
 
def picksearchurls(statement, search, article): # select URLs to get from search results
  prompt = ('Which of the following web search results would you use to try' +
      ' to verify the statement «' + statement + '» from the Wikipedia' +
      ' article "' + article + '"? Pick at least two, and answer only with' +
      ' their result numbers separated by commas:\n\n')
  searchres = duckduckget(search)
  #print('  DDG returned', len(searchres), 'results')
  if len(searchres) < 1:
    return []
  for num, title, link, snippet in searchres:
    prompt += ('Result ' + num + ': page: "' + title + '"; URL: ' + link +
        ' ; snippet: "' + snippet + '"\n')
  numbers = claude(prompt).strip()
  #print('  Claude wants search results:', numbers)
  if len(numbers) > 0 and numbers[0].isnumeric():
    resnos = []
    for rn in numbers.split(','):
      if rn.strip().isnumeric():
        resnos.append(int(rn.strip()))
    urls = []
    for n in resnos:
      urls.append(searchres[n - 1][2])
    return urls
  else:
    return []
 
def trytoverify(statement, search, article): # 'search' is the suggested query string
  urls = picksearchurls(statement, search, article)
  if len(urls) < 1:
    print('  NO URLS for', search)
    return []
  #print('  URLs:', urls)
  retlist = []
  for url in urls:
    page = get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh;' +
        'Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0'}).text
    try:
      pagetext = BeautifulSoup(page).get_text(' ', strip=True) # EDIT: BUG FIXED; example output below is better
      #print('  fetching', url, 'returned', len(pagetext), 'characters')
    except:
      print('  fetching', url, 'failed')
      continue
    prompt = ('Is the statement «' + statement + '» from the Wikipedia' +
        ' article "' + article + '" verified by the following text from the' +
        ' source at ' + url + ' ? Answer either "YES: " followed by the excerpt' +
        ' which verifies the statement, or "NO." if this text does not verify' +
        ' the statement:\n\n' + pagetext[:16000]) # have to truncate again, this is bad because the verification might be at the end
                                                  # so, this needs to be done in chunks when it's long
    result = claude(prompt)
    #print('  for', url, 'Claude says:', result)
    if 'YES:' in result:
      retlist.append((url, result.split('YES:')[1].strip()))
  return retlist # return only after every candidate URL has been checked
 
def checkarticle(article): # main routine; call this on a non-redirect title
  pairs = passagize(article)
  for passage, query in pairs:
    print('Trying to verify «' + passage + '» using the query "' + query + '":')
    vs = trytoverify(passage, query, article)
    if len(vs) < 1:
      print(' NO verifications.')
    for url, excerpt in vs:
      print(' VERIFIED by', url, 'saying:', excerpt)

Example output from checkarticle("Tucker Carlson") -- NOTE: this example output suffered from a bug which has been fixed in the code above, and should be re-run:

Trying to verify «Carlson began his media career in the 1990s, writing for The Weekly Standard and other publications.» using the query "tucker carlson weekly standard":
 NO verifications.
Trying to verify «Carlson's father owned property in Nevada, Vermont, and islands in Maine and Nova Scotia.» using the query "dick carlson property":
  fetching https://soleburyhistory.org/program-list/honored-citizens/richard-f-carlson-2013/ failed
 NO verifications.
Trying to verify «In 1976, Carlson's parents divorced after the nine-year marriage reportedly "turned sour".» using the query "tucker carlson parents divorce":
 NO verifications.
Trying to verify «Carlson's mother left the family when he was six and moved to France.» using the query "tucker carlson mother leaves":
 NO verifications.
Trying to verify «Carlson was briefly enrolled at Collège du Léman, a boarding school in Switzerland, but said he was "kicked out".» using the query "tucker carlson college du leman":
  fetching https://abtc.ng/tucker-carlson-education-tucker-carlsons-high-school-colleges-qualifications-degrees/ failed
 NO verifications.
Trying to verify «He then worked as an opinion writer at the Arkansas Democrat-Gazette newspaper in Little Rock, Arkansas, before joining The Weekly Standard news magazine in 1995.» using the query "tucker carlson arkansas democrat gazette":
  fetching https://www.cjr.org/the_profile/tucker-carlson.php failed
 NO verifications.
Trying to verify «Carlson's 2003 interview with Britney Spears, wherein he asked if she opposed the ongoing Iraq War and she responded, "[W]e should just trust our president in every decision he makes", was featured in the 2004 film Fahrenheit 9/11, for which she won a Golden Raspberry Award for Worst Supporting Actress at the 25th Golden Raspberry Awards.» using the query "tucker carlson britney spears interview":
  fetching https://classic.esquire.com/article/2003/11/1/bending-spoons-with-britney-spears failed
  fetching https://www.cnn.com/2003/SHOWBIZ/Music/09/03/cnna.spears/ failed
 NO verifications.
Trying to verify «Carlson announced he was leaving the show roughly a year after it started on June 12, 2005, despite the Corporation for Public Broadcasting allocating money for another show season.» using the query "tucker carlson leaves pbs show":
 NO verifications.
Trying to verify «MSNBC (2005–2008) Tucker was canceled by the network on March 10, 2008, owing to low ratings; the final episode aired on March 14, 2008.» using the query "tucker carlson msnbc show cancellation":
  fetching https://www.msn.com/en-us/tv/other/is-this-the-end-of-tucker-carlson/ar-AA1ahALw failed
 NO verifications.
Trying to verify «He remained with the network as a senior campaign correspondent for the 2008 election.» using the query "tucker carlson msnbc senior campaign correspondent":
  fetching https://www.c-span.org/person/?41986/TuckerCarlson failed
 VERIFIED by https://www.reuters.com/article/industry-msnbc-dc-idUSN1147956320080311 saying: "He will remain with the network as senior campaign correspondent after the show goes off the air Friday."
Trying to verify «Carlson had cameo appearances as himself in the Season 1 episode "Hard Ball" of 30 Rock and in a Season 9 episode of The King of Queens.» using the query "tucker carlson 30 rock cameo":
  fetching https://www.imdb.com/title/tt0496424/characters/nm1227121 failed
  fetching https://www.tvguide.com/celebrities/tucker-carlson/credits/3000396156/ failed
  fetching https://www.britannica.com/biography/Tucker-Carlson failed
 NO verifications.
Trying to verify «Tucker Carlson Tonight aired at 7:00 p.m. each weeknight until January 9, 2017, when Carlson's show replaced Megyn Kelly at the 9:00 p.m. time slot after she left Fox News.» using the query "tucker carlson tonight replaces megyn kelly":
 VERIFIED by https://www.orlandosentinel.com/entertainment/tv-guy/os-fox-news-tucker-carlson-replaces-megyn-kelly-20170105-story.html saying: "Tucker Carlson Tonight" debuted at 7 p.m. in November.

Obviously it still has a long way to go to be genuinely useful, but I hope someone gets something from it. I'm going to keep trying for something that attempts to verify existing sources. Sandizer (talk) 12:20, 27 April 2023 (UTC)

That is borderline genius. ~ ONUnicorn(Talk|Contribs)problem solving 13:12, 27 April 2023 (UTC)
Thank you; you're too kind. As the example output shows, Claude doesn't pick the greatest statements to verify, picks very few compared to the number asked, does a lousy job formulating search queries for most of them, doesn't pick the best search results for the verification (and doesn't have a PDF reader or paywall-jumper, sadly), and when it does report that a statement can be verified, it doesn't usually pick the best excerpt for proving the central claims -- often the source excerpts are unrelated or only superficially related to the statement they are supposed to verify. Whether those problems can be overcome by prompt engineering or algorithm changes remains to be seen, but I wouldn't get your hopes up without multiple times more than the 10-12 hours of work that's gone into it so far. The code also truncates both the article and all the sources to fit inside the LLM context window. That can be fixed by chunking, but again at a substantial increase in code complexity.
So, I have abandoned that avenue in favor of my original goal to use the references in an article to try to verify the claims where they're cited. Please help by testing the Python code here to get article plain text with numbered refs and URLs -- no LLM API key needed. Sandizer (talk) 01:06, 29 April 2023 (UTC)
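For the context-window problem mentioned above, a rough, untested sketch of the chunking approach might look like the following. It assumes the claude() helper defined in the script above; the 12,000-character chunk size, the 500-character overlap, and the trytoverify_chunked name are arbitrary choices for illustration, not tuned values.
def chunked(text, size=12000, overlap=500): # split long source text into overlapping windows
  start = 0
  while start < len(text):
    yield text[start:start + size]
    start += size - overlap

def trytoverify_chunked(statement, url, pagetext, article):
  # ask about each chunk instead of truncating; stop at the first confirming chunk
  for chunk in chunked(pagetext):
    prompt = ('Is the statement «' + statement + '» from the Wikipedia article "'
        + article + '" verified by the following excerpt of the source at ' + url
        + ' ? Answer either "YES: " followed by the text which verifies the'
        + ' statement, or "NO." if this excerpt does not verify it:\n\n' + chunk)
    result = claude(prompt)
    if 'YES:' in result:
      return [(url, result.split('YES:')[1].strip())]
  return []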

It sort of works!

Python code to verify an article's existing reference URLs with sample output
!pip install anthropic
import anthropic

from requests import get
from bs4 import BeautifulSoup as bs
from re import sub as resub, match as rematch, finditer

llm = anthropic.Client('[ api key from https://console.anthropic.com/account/keys ]')

def claude(prompt): # get a response from the Anthropic Claude v1.3 LLM
  return llm.completion(model='claude-v1.3', temperature=0.85, 
      prompt=f'{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}',
      max_tokens_to_sample=1000, stop_sequences=[anthropic.HUMAN_PROMPT]
      )['completion']

def textarticlewithrefs(title):
  # get English Wikipedia article in plain text but with numbered references including link URLs
  resp = get('https://en.wikipedia.org/w/api.php?action=parse&format=json&page='
             + title).json()
  if 'error' in resp:
    raise FileNotFoundError(f"'{ title }': { resp['error']['info'] }")

  html = resp['parse']['text']['*'] # get parsed HTML
  
  if '<div class="redirectMsg"><p>Redirect to:</p>' in html: # recurse redirects
    return textarticlewithrefs(resub(r'.*<ul class="redirectText"><li><a'
        + ' href="/wiki/([^"]+)"[^\0]*', '\\1', html))

  cleantitle = resp['parse']['title'] # fixes urlencoding and unicode escapes
  try:
    [body, refs] = html.split('<ol class="references">')
    #body += refs[refs.find('\n</ol></div>')+12:] # move external links etc. up
  except:
    body = html; refs = ''
  
  b = resub(r'\n<style.*?<table [^\0]*?</table>\n', '\n', body) # rm boxes
  #print(b)
  b = resub(r'<p>', '\n<p>', b) # newlines between paragraphs
  b = resub(r'(</table>)\n', '\\1 \n', b) # space after amboxes
  b = resub(r'(<span class="mw-headline" id="[^"]*">.+?)(</span>)',
               '\n\n\\1:\\2', b) # put colons after section headings
  b = resub(r'([^>])\n([^<])', '\\1 \\2', b) # merge non-paragraph break
  b = resub(r'<li>', '<li>* ', b) # list item bullets for beautifulsoup
  b = resub(r'(</[ou]l>)', '\\1\n\n<br/>', b) # blank line after lists
  b = resub(r'<img (.*\n)', '<br/>--Image: <img \\1\n<br/>\n', b) # captions
  b = resub(r'(\n.*<br/>--Image: .*\n\n<br/>\n)(\n<p>.*\n)',
            '\\2\n<br/>\n\\1', b) # put images after following paragraph 
  b = resub(r'(role="note" class="hatnote.*\n)', '\\1.\n<br/>\n', b) # see/main
  b = resub(r'<a rel="nofollow" class="external text" href="(http[^"]+)">(.+?)</a>',
            '\\2 [ \\1 ]', b) # extract external links as bracketed urls
  b = bs(b[b.find('\n<p>'):]).get_text(' ') # to text; lead starts with 1st <p>
  b = resub(r'\s*([?.!,):;])', '\\1', b) # various space cleanups
  b = resub(r'  *', ' ', resub(r'\( *', '(', b)) # rm double spaces and after (
  b = resub(r' *\n *', '\n', b) # rm spaces around newlines
  b = resub(r'[ \n](\[\d+])', '\\1', b) # rm spaces before inline refs
  b = resub(r' \[ edit \]\n', '\n', b).strip() # drop edit links
  b = resub(r'\n\n\n+', '\n\n', b) # rm vertical whitespace

  r = refs[:refs.find('\n</ol></div>')+1] # optimistic(?) end of reflist
  r = resub(r'<li id="cite_note.*?-(\d+)">[^\0]*?<span class=' # enumerate...
            + '"reference-text"[^>]*>\n*?([^\0]*?)</span>\n?</li>\n',
           '[\\1] \\2\n', r) # ...the references as numbered separate lines
  r = resub(r'<a rel="nofollow" class="external text" href="(http[^"]+)">(.+?)</a>',
            '\\2 [ \\1 ]', r) # extract external links as bracketed urls
  r = bs(r).get_text(' ') # unHTMLify
  r = resub(r'\s([?.!,):;])', '\\1', r) # space cleanups again
  r = resub(r'  *', ' ', '\n' + r) # rm double spaces, add leading newline
  r = resub(r'\n\n+', '\n', r) # rm vertical whitespace
  r = resub(r'(\n\[\d+]) [*\n] ', '\\1 ', r) # multiple source ref tags
  r = resub(r'\n ', '\n     ', r) # indent multiple source ref tags

  refdict = {} # refnum as string -> (reftext, first url)
  for ref in r.split('\n'):
    if len(ref) > 0 and ref[0] == '[':
      rn = ref[1:ref.find(']')]
      reftext = ref[ref.find(']')+2:]
      if '[ http' in reftext:
        firsturl = reftext[reftext.find('[ http')+2:]
        firsturl = firsturl[:firsturl.find(' ]')]
        refdict[rn] = (reftext, firsturl)

  return cleantitle + '\n\n' + b + r, refdict

def verifyrefs(article): # Wikipedia article title
  atext, refs = textarticlewithrefs(article)
  title = atext.split('\n')[0]
  print('Trying to verify references in:', title)

  for par in atext.split('\n'):
    if par == 'References:' or rematch(r'\[\d+] [^[].+', par):
      continue # ignore references section of article
    for m in list(finditer(r'\[\d+]', par)):
      refnum = par[m.start()+1:m.end()-1]
      excerpt = par[:m.end()]
      if refnum in refs:
        [reftext, url] = refs[refnum]
        print(' checking ref [' + refnum + ']:', excerpt)
        print('  reference text:', reftext)
        try:
          page = get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; ' +
              'Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0'}).text
          pagetext = bs(page).get_text(' ', strip=True)
          print('  fetching', url, 'returned', len(pagetext), 'characters')
        except:
          print('  failed to fetch', url)
          continue
        prompt = ( 'Can the following excerpt from the Wikipedia article "' 
                  + title + '" be verified by its reference [' + refnum + ']?'
                  + '\n\nThe excerpt is: ' + excerpt + '\n\nAnswer either'
                  + ' "YES: " followed by the sentence of the source text'
                  + ' confirming the excerpt, or "NO: " followed by the reason'
                  + ' that it does not. The source text for reference [' 
                  + refnum + '] (' + reftext.strip() + ') is:\n\n' 
                  + pagetext[:10000]  ) # truncated source text, TODO: chunk
        print('  response:', resub(r'\s+', ' ', claude(prompt)).strip())
      else:
        print('  reference [' + refnum + '] has no URL')

Sample output from this random article: verifyrefs('Elise Konstantin-Hansen')

Trying to verify references in: Elise Konstantin-Hansen
 checking ref [1]: Elise Konstantin-Hansen (1858–1946) was a Danish painter and ceramist. She developed her own naturalistic style, often painting sea birds, animals, plants and beach scenes.[1]
  reference text: "Elise Konstantin-Hansen" [ http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen ] (in Danish). Dansk Biografisk Leksikon. Retrieved 5 March 2016.
  fetching http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen returned 5838 characters
  response: YES: "Allerede 1882 var hun begyndt at udstille på Charlottenborg (Før Skoletid. En lille Pige som børster Støvler), og 1885 fik hun den Neuhausenske præmie for Drenge udenfor en Grønthandel. " The source text confirms that Elise Konstantin-Hansen began exhibiting at Charlottenborg in 1882 and won the Neuhausen Prize in 1885, as stated in the excerpt.
 checking ref [2]: Elise Konstantin-Hansen (1858–1946) was a Danish painter and ceramist. She developed her own naturalistic style, often painting sea birds, animals, plants and beach scenes.[1][2]
  reference text: Munk, Jens Peter. "Elise Konstantin-Hansen (1858 - 1946)" [ http://www.kvinfo.dk/side/597/bio/512/origin/170/ ] (in Danish). Kvinfo. Retrieved 5 March 2016.
  failed to fetch http://www.kvinfo.dk/side/597/bio/512/origin/170/
 checking ref [2]: Konstantin-Hansen was born in the Frederiksberg district of Copenhagen on 4 May 1858. She was the daughter of the Golden Age painter Carl Christian Constantin Hansen and his wife Magdelene Barbara Købke. She changed the spelling of her name to Konstantin-Hansen in 1908.[2]
  reference text: Munk, Jens Peter. "Elise Konstantin-Hansen (1858 - 1946)" [ http://www.kvinfo.dk/side/597/bio/512/origin/170/ ] (in Danish). Kvinfo. Retrieved 5 March 2016.
  failed to fetch http://www.kvinfo.dk/side/597/bio/512/origin/170/
 checking ref [2]: She grew up in an artistic milieu. Several of her eight siblings were artistically talented, especially her sister Kristiane who became an embroiderer. Konstantin-Hansen herself also began embroidering while young, often developing her own designs. Later, after her father died in 1880, she became Thorvald Bindesbøll 's principal assistant.[2]
  reference text: Munk, Jens Peter. "Elise Konstantin-Hansen (1858 - 1946)" [ http://www.kvinfo.dk/side/597/bio/512/origin/170/ ] (in Danish). Kvinfo. Retrieved 5 March 2016.
  failed to fetch http://www.kvinfo.dk/side/597/bio/512/origin/170/
 checking ref [3]: In addition to being introduced to painting by her father, she attended Vilhelm Kyhn 's painting school[3]
  reference text: "Anna Ancher & Co" [ https://web.archive.org/web/20171201030634/http://www.sophienholm.dk/side.asp?ID=704 ]. Sophienholm (in Danish). Archived from the original [ http://www.sophienholm.dk/side.asp?ID=704 ] on 1 December 2017. Retrieved 13 March 2017.
  fetching https://web.archive.org/web/20171201030634/http://www.sophienholm.dk/side.asp?ID=704 returned 2494 characters
  response: YES: "Landskabsmaleren Vilhelm Kyhn (1819-1903) drev i de sidste årtier af 1800-tallet Danmarks mest populære kunstskole for kvinder. Kvinder havde ikke adgang til Kunstakademiet på den tid, men over 75 kvinder med stærke kunstneriske ambitioner fik deres første undervisning hos Kyhn."
 checking ref [1]: In addition to being introduced to painting by her father, she attended Vilhelm Kyhn 's painting school[3] and was instructed by Christen Dalsgaard and Laurits Tuxen before studying in Paris in 1886.[1]
  reference text: "Elise Konstantin-Hansen" [ http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen ] (in Danish). Dansk Biografisk Leksikon. Retrieved 5 March 2016.
  fetching http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen returned 5838 characters
  response: YES: 1886 studerede hun i Paris, The source text confirms that Elise Konstantin-Hansen studied in Paris in 1886, as stated in the excerpt.
 checking ref [1]: In the early 1880s, she practised embroidery, adopting the Pompeian approach favoured by Thorvald Bindesbøll. She went on to work with ceramics at the Utterslev workshop with Bindesbøll and the Skovgaard brothers where she developed a freer, more personal style. Significantly influenced by Japanese art, she developed an ornamental approach to the seabirds, animals, plants and beach scenes she frequently painted. Some of her ceramic creations are among her most notable works, especially a plate depicting a starfish, a glazed relief with oyster catchers, and distinctive bowls with vultures.[1]
  reference text: "Elise Konstantin-Hansen" [ http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen ] (in Danish). Dansk Biografisk Leksikon. Retrieved 5 March 2016.
  fetching http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen returned 5838 characters
  response: NO: The source text does not confirm that exact excerpt. While it does mention Konstantin-Hansen's early practice of embroidery and work with ceramics with Bindesbøll and the Skovgaard brothers, and her ornamental style incorporating birds, animals and beach scenes, it does not specify that she developed a freer, more personal style significantly influenced by Japanese art in the early 1880s, nor does it mention her notable works including a plate depicting a starfish, a glazed relief with oyster catchers, and bowls with vultures.
 checking ref [4]: Konstantin-Hansen painted her last major work in 1930, Svaneflok i våge, depicting a group of swans near Kolding Fjord.[4]
  reference text: "Elise Konstantin-Hansen" [ http://www.kolding-kunstforening.dk/udstilling/ekh ] (in Danish). Kolding Kunstforening. Retrieved 6 March 2016.
  fetching http://www.kolding-kunstforening.dk/udstilling/ekh returned 139 characters
  response: NO: The source text is in Danish, and does not contain an exact match for the excerpt in English. Without a full translation of the Danish text, the excerpt cannot be verified.
 checking ref [5]: From 1882, Konstantin-Hansen exhibited her paintings at Charlottenborg where in 1885 she won the Neuhausen Prize for her Drenge udenfor en Grønthandel (Boy Outside a Greengrocer's). In 1893, she started instead to exhibit at Den Frie Udstilling where she continued to display her works until 1928. Her works were also exhibited abroad, in Paris (1889), and Berlin (1910–11). Konstantin-Hansen exhibited her work at the Palace of Fine Arts at the 1893 World's Columbian Exposition in Chicago, Illinois.[5]
  reference text: Nichols, K. L. "Women's Art at the World's Columbian Fair & Exposition, Chicago 1893" [ http://arcadiasystems.org/academia/cassatt10b.html#hansen ]. Retrieved 24 July 2018.
  fetching http://arcadiasystems.org/academia/cassatt10b.html#hansen returned 17459 characters
  response: NO: The source text does not mention Konstantin-Hansen exhibiting at the 1893 World's Columbian Exposition in Chicago. It only lists artworks exhibited by other Danish women artists at the Exposition.
 checking ref [1]: From 1882, Konstantin-Hansen exhibited her paintings at Charlottenborg where in 1885 she won the Neuhausen Prize for her Drenge udenfor en Grønthandel (Boy Outside a Greengrocer's). In 1893, she started instead to exhibit at Den Frie Udstilling where she continued to display her works until 1928. Her works were also exhibited abroad, in Paris (1889), and Berlin (1910–11). Konstantin-Hansen exhibited her work at the Palace of Fine Arts at the 1893 World's Columbian Exposition in Chicago, Illinois.[5] In 1917, she also exhibited at the Danish Art Trade show (Dansk Kunsthandel) and at the 1920 Women's Retrospective Exhibition (Kvindelige kunstneres retrospektive udstilling).[1]
  reference text: "Elise Konstantin-Hansen" [ http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen ] (in Danish). Dansk Biografisk Leksikon. Retrieved 5 March 2016.
  fetching http://denstoredanske.dk/Dansk_Biografisk_Leksikon/Kunst_og_kultur/Billedkunst/Maler/Elise_Konstantin-Hansen returned 5838 characters
  response: NO: The excerpt mentions that Konstantin-Hansen exhibited at the World's Columbian Exposition in Chicago in 1893. However, the source text does not confirm this information. The source text does mention that she exhibited in Paris in 1889 and in Berlin in 1910-11, but does not include any information about her exhibiting in Chicago in 1893.
 checking ref [4]: From 1882, Konstantin-Hansen exhibited her paintings at Charlottenborg where in 1885 she won the Neuhausen Prize for her Drenge udenfor en Grønthandel (Boy Outside a Greengrocer's). In 1893, she started instead to exhibit at Den Frie Udstilling where she continued to display her works until 1928. Her works were also exhibited abroad, in Paris (1889), and Berlin (1910–11). Konstantin-Hansen exhibited her work at the Palace of Fine Arts at the 1893 World's Columbian Exposition in Chicago, Illinois.[5] In 1917, she also exhibited at the Danish Art Trade show (Dansk Kunsthandel) and at the 1920 Women's Retrospective Exhibition (Kvindelige kunstneres retrospektive udstilling).[1] More recently, examples of her work were exhibited in 2013 at Kolding Kunstforening.[4]
  reference text: "Elise Konstantin-Hansen" [ http://www.kolding-kunstforening.dk/udstilling/ekh ] (in Danish). Kolding Kunstforening. Retrieved 6 March 2016.
  fetching http://www.kolding-kunstforening.dk/udstilling/ekh returned 139 characters
  response: YES: More recently, examples of her work were exhibited in 2013 at Kolding Kunstforening.[4]

There is still much to be done, e.g., chunking when the source text is too big for the context window, PDF text extraction, and handling the case where a reference number occurs more than once in the same paragraph (subsequent excerpts should not include any of the text up to and including the earlier occurrences). Also, some of the verification decisions are plainly wrong because the check wasn't focused on the specific text immediately before the reference. I will work on those things tomorrow. Sandizer (talk) 04:37, 29 April 2023 (UTC)
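A minimal sketch of the chunking step mentioned above, under stated assumptions: the 12,000-character limit, the function names, and the verify_chunk callback (assumed to turn the model's YES/NO answer into a boolean) are illustrative placeholders, not part of the actual script.

def split_into_chunks(text, max_chars=12000, overlap=500):
    """Split fetched source text into overlapping chunks that fit the model's context window."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap so a supporting passage isn't cut in half at a boundary
    return chunks

def verify_excerpt(excerpt, source_text, verify_chunk):
    """Return True if any chunk of the source supports the excerpt."""
    return any(verify_chunk(excerpt, chunk) for chunk in split_into_chunks(source_text))

The overlap means a small amount of text gets checked twice, but it avoids missing a supporting passage that straddles a chunk boundary.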

Very cool indeed. DFlhb (talk) 09:17, 29 April 2023 (UTC)
Very nice already, amazing idea.—Alalch E. 19:39, 29 April 2023 (UTC)
Pretty cool. Some thoughts: I toyed with something that lets editors include a quote for each sentence (I spent a while including them as quotes in references, until people said this wasn't a good idea - but there might be other approaches) - this would give us a dataset to check against... you could also imagine an interface where you give people choices.
There's a bit of an academic field surrounding fact-checking - obviously generative models change everything - so it might be as well to ignore it and just carry on... on the other hand this paper looks fun: https://aclanthology.org/2020.lrec-1.849.pdf and appears to include a benchmark - so we could run against that and check performance.
Also there's this tool, which we could play with... though that's a little more open-QA, but I imagine there might be something there. TALpedia 17:13, 3 May 2023 (UTC)

News update

Here are some news articles since the last update I posted.    — The Transhumanist   10:26, 2 May 2023 (UTC)

  1. ChatGPT is more empathetic than real doctors, experts say | Business Insider (May 2, 2023)
  2. Chemists are teaching GPT-4 to do chemistry and control lab robots | New Scientist (April 29, 2023)
  3. Exclusive: Behind EU lawmakers' challenge to rein in ChatGPT and generative AI | Reuters (April 28, 2023)
  4. YouTube case at Supreme Court could shape protections for ChatGPT and AI | Reuters (April 24, 2023)
  5. How prompt injection can hijack autonomous AI agents like Auto-GPT | VentureBeat (April 24, 2023)
  6. Auto-GPT May Be The Strong AI Tool That Surpasses ChatGPT | Forbes (April 24, 2023)
  7. Hyena could blow away GPT-4 and everything like it | ZDNET (April 20, 2023)
  8. Meet AutoGPT, the ChatGPT version everyone’s talking about | Digit (April 19, 2023) See also: Auto-GPT
  9. Generative AI Is Exploding. These Are The Most Important Trends You Need To Know | Forbes (April 11, 2023)

Major trim

I've trimmed this draft (see diff). My previous thread veered into a discussion about a single section, but I think we need to discuss this seriously. After a break, I'm coming back to this draft with fresh eyes, and it's bad.

Most of it is cruft. It's meandering and frequently repeats itself. It bans the use of LLMs to spam talk pages, but talk page spam is already against the rules. It says that LLM-assisted drafts must comply with policy or be rejected, but that's already true for all drafts. It says that editors can't do high-speed editing with LLMs, but they already can't do high-speed editing.

Keep in mind that policies are supposed to reflect existing consensus, not create new rules out of thin air. But this draft fails that test. It recommends WP:G3, but admins have refused G3 nominations of LLM articles. There is no consensus for creating a new criterion, and therefore, for mentioning any CSD in this draft. There's no consensus that LLM use should be reserved for experienced editors (as if experienced editors never misbehaved!). And a policy shouldn't idly muse about whether LLMs comply with CC BY-SA; the U.S. Copyright Office has determined that LLM outputs are not copyrightable, so our musings don't belong in policy. DFlhb (talk) 23:44, 21 March 2023 (UTC)

Forgot to note: I liked the "Relevant policies and associated risks", but it wasn't very policy-like, and I kept the good bits. This wasn't an indiscriminate trim. DFlhb (talk) 23:51, 21 March 2023 (UTC)
Any thoughts on how Wikipedia should respond to emerging change if not to create policy ahead of time? Is there a concept of "guidance" or should this be downgraded to an essay? I guess you wait for a case, see what people decide, then elevate that to a policy? Is it useful to connect up the different cases to make the formation of a policy easier? Another approach is to try to write which existing policies apply rather than create new policy. Talpedia (talk) 23:59, 21 March 2023 (UTC)
Well... it was kind of inevitable, because those who think the essential policy-related considerations are already covered by existing policies didn't come to this page from the village pump discussion, as they didn't see a need for a new policy. I followed anyway because I wanted to try to help make any resulting guidance as concise as possible. I wrote an essay, "Address problems without creating new specialized rules", which is relevant: the more special cases introduced, the more complicated it becomes for anyone to understand the rules. We're better off not writing new rules when the existing ones sufficiently cover the scenarios in question. Even without new policy, we can provide guidance for new situations. Existing policy can be put into context for new scenarios, and we could, for example, guide people to use a hatnote to provide appropriate attribution of content generated by a program, similar to {{Sect1911}}. isaacl (talk) 01:43, 22 March 2023 (UTC)
I've reverted; I feel that this removed a lot of relevant guidance, and some of these sections are currently under discussion if you'd like to help refine them. The "Responsible Use" section that you left is a good summary, but it doesn't replace the more in-depth guidance on the page. Regarding deletion, you're correct that LLM content isn't universally accepted for G3, but isn't it true that content that's "factually incorrect or relies on fabricated sources" is eligible for G3? –dlthewave 02:54, 22 March 2023 (UTC)
I recall seeing multiple admins (either at ANI or wherever) decline G3 even for completely unedited LLM output (because it's not a "blatant hoax", just has some misinformation), and request that the community create an LLM-specific CSD if they want one.
Regardless: I really hoped we could keep the trimmed version, and individually discuss adding things back, rather than individually discuss what to remove. I don't have "specific concerns". I took a few hours to go line by line, meticulously, and removed what I thought was pointless, and it turned out to be 90% of the page. It wasn't hollowed out. I even added something inspired by the early JPxG version, which to me encapsulates the meat of the policy: Never paste LLM outputs directly into Wikipedia. You must rigorously scrutinize all your LLM-assisted edits before hitting "Publish". The main goal of a policy isn't enforcement, it's prevention. And the clearer a policy is, the more people will comply. I'd bet that even with less guidance, my version would result in less rule-breaking and a lower cleanup burden, not a higher one. WP:LLM doesn't have the recognizability of BLP or V or NOR, so it can only make up for that by being limpid and concise, if we want high adherence. Besides, what is guidance doing in a policy? I'd strongly vote against adoption, as is. It's dead-on-arrival. DFlhb (talk) 03:25, 22 March 2023 (UTC)
To elaborate on that last part a tiny bit: my idea, post-trim, was to turn the mishmash of links in WP:LLM#See_also into a unified essay that would provide guidance on how to use LLMs well (for example, good prompts can make a world of difference), which could potentially be upgraded to a guideline later on. Seemed like the best outcome. DFlhb (talk) 03:32, 22 March 2023 (UTC)
Many of the removed passages were based on previous talk-page consensus. So removing all of them in one go without any further discussion is not a good idea.
@DFlhb: do you think that, by your own lights, your version does not have the same problems you see in the current version? To me, your argument seems to lead to the conclusion that we don't need a policy at all. Phlsph7 (talk) 08:23, 22 March 2023 (UTC)
We do need a policy, so that people know how to behave, and so admins have "policy ground" to stand on that they can uncontroversially enforce. The point of the trim was to increase the likelihood of adoption by removing the rules we made up that I expect to be unpopular among the wider community. DFlhb (talk) 11:02, 22 March 2023 (UTC) struck & made my point clearer; I don't disagree with isaacl on this 09:31, 23 March 2023 (UTC)
I don't think there is anything inherently wrong in making up new rules as part of a policy discussion. I do think, though, as I mentioned previously, that avoiding doing so as much as possible is beneficial to getting people to understand and follow all the related guidance. A requirement to disclose the use of a type of tool would be a new requirement, and so some new guideline or policy would be needed. (I think disclosure is a more accurate term for what is being sought, versus attribution.) Everything else, as far as I can recall, is just reminding editors that they are responsible for the content changes they make, regardless of the tools used. isaacl (talk) 17:02, 22 March 2023 (UTC)
According to you, most of the current version is cruft. I don't understand how you came to that conclusion. I assume that you don't see the contents of your version as cruft. So what would you say to someone who claimed that most of the text in your version is cruft and replaced its content with the single sentence: "LLM usage must follow Wikipedia policy"? The better approach is probably to address the supposed problems one at a time (like a discussion of the G3) since other editors may not see them as problems.
At the risk of oversimplifying, I would say that the main purpose of this policy is twofold: make up new rules and clarify how existing rules apply. These issues may often be intertwined since the stipulation of how existing rules apply may itself incorporate new ideas on how they are to be interpreted. But your characterization of telling people to behave and giving admins "policy ground" also works. In this regard, cruft would be something that does not contribute to these purposes. Phlsph7 (talk) 08:37, 23 March 2023 (UTC)
When I wrote the initial version of the draft, this was the general thrust. The situation, at that time, was that a number of editors had been demanding a policy be written because there were so many forms of hypothetical abuse. I had attempted to explain that all of these ("what if someone writes a bunch of unsourced garbage? what if someone writes 'peepee poopoo' on wikipedia?") were already extremely against the rules to do. In fact, the first revision of this page has the whole policy in two sentences: "Editors who use the output of large language models (LLMs) as an aid in their editing are subject to the policies of Wikipedia. It is a violation of these policies to not follow these policies." I eventually added some more stuff (and later so did other people) because feedback was strongly in favor of more specific items of explanation and interpretation (i.e. explicitly saying that you are not allowed to use LLMs to create hoaxes, even though this is an extremely obvious core foundational policy of the project). jp×g 10:12, 23 March 2023 (UTC)
As I said in the essay I mentioned earlier, there are some editors who think editors shouldn't do X, therefore, we need a rule saying editors shouldn't do X. This approach, though, just exacerbates the problem of editors thinking that contributing to Wikipedia is overly complex. When discussing a new situation, we ought to isolate discussion as much as possible to what's new, and defer everything else to the general policies that cover everything. It would also be better not to label any new way to implement an existing policy as a new policy. It might be a new procedure, or just a new explanation to put the new scenario into context. isaacl (talk) 17:44, 23 March 2023 (UTC)
We could split the trimmed stuff to an explanatory essay. IMO, that's the proper place to clarify how existing rules apply (quoting Phlsph7), not in a policy. DFlhb (talk) 17:56, 23 March 2023 (UTC)
If you want to reduce the policy to the bare minimum, we probably just need one sentence covering attribution. But this minimalist approach to policy writing is not reflected in other policies. For example, have a look at Wikipedia:Deletion_policy#Alternatives_to_deletion or Wikipedia:Deletion_policy#Other_issues. Or if you look at Wikipedia:Copyrights, most of it is an explanation of how copyright in general works and not any specific new rules that apply only to Wikipedia.
Our policy is to be used in certain ways. How much detail we provide depends on what usage we expect and what dangers we try to avoid. We have to make sure that there is a broad consensus for the general points. The policy should not become too long or include excessive details that are almost never practically relevant. Phlsph7 (talk) 09:58, 24 March 2023 (UTC)
If there is consensus that we should go for a minimalist approach then I would suggest that we find another policy to add a sentence or two instead of creating a new one. Phlsph7 (talk) 10:10, 24 March 2023 (UTC)
I support DFlhb's trim. There are compelling reasons to keep P&G as concise as possible, and I agree that the current version has a lot of advice which is redundant/obvious in light of existing policy ("Do not, under any circumstances, use LLMs to generate hoaxes or disinformation"? No kidding). Colin M (talk) 17:39, 22 March 2023 (UTC)
  • I think the version prior to this edit was good, and I think the version after this edit is good too. I don't know if I like it or not. Per my comment a couple paragraphs up, the initial version of this draft was extremely concise and to-the-point. Since then, it became much larger, as people added more and more stuff (and as it cross-pollinated itself with a couple other drafts and how-to pages and essays and etc). So I think bringing it back to an extremely brief few paragraphs might be good. On the other hand, over the last few months, what was initially a giant pile of disjunct sections did get sorted out into a pretty nice and comprehensive explanation of the role of LLMs in editing, which I think dovetailed rather nicely with what limited consensus exists. So I guess it's hard to say whether it is good or bad... jp×g 10:12, 23 March 2023 (UTC)
  • If I may: this draft's Flesch reading ease score is 41 (corresponding to college undergrad level); it is harder to read than 70% of Wikipedia articles. For comparison, Fast inverse square root's score is 56 (high school level), harder than 33% of Wikipedia articles. The WP:CREEP is already off the charts, and it doesn't even have the excuse of being a decade-old policy (hence my attempts at WP:TNT to return to reason). DFlhb (talk) 19:37, 30 March 2023 (UTC)
    • Maybe it's because of the See also section. There's also heavy novel jargon. We didn't even have an LLM article until recently. This draft appeared before the article was written. That metric probably decimates the reading ease score each time a term such as "LLM" is encountered. —Alalch E. 19:59, 30 March 2023 (UTC)
      • Do keep in mind the algorithm only takes into account sentence length and syllable count; it also only processes full sentences, and ignores headings, markup, and sections like "See also". And finally, it doesn't take into account word familiarity or novelty. LLM has zero impact, and large language model has low impact. So I do think readability deserves some improvements. DFlhb (talk) 03:36, 1 April 2023 (UTC)
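For reference, the standard published Flesch reading ease formula behind those scores (nothing specific to any Wikipedia tool; higher scores mean easier text) is:

score = 206.835 − 1.015 × (total words / total sentences) − 84.6 × (total syllables / total words)

So the score is driven entirely by average sentence length and average syllables per word, which matches the point above about word familiarity and novelty having no effect.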

@DFlhb: Upon further reflection... well, honestly, upon a big-ass thread at WP:ANI, where people are desperately flailing around to figure out what to do with LLM content in mainspace. And upon this being the third or fourth time this has happened since December, I am somewhat inclined to come around to your point of view here. We need to have some policy about this, which means we should have an RfC, which means we should cut this down to an absolute minimum. I made Wikipedia:Large language model guidelines as a redirect a while ago. Here is my proposal: we split this into WP:LLM, a short and concise page per your edits earlier, and all the stuff currently here goes to WP:LLMG, which is a large beautiful page that comprehensively goes over everything in nice detail. Then we start an RfC here for a minimum viable policy, and potentially start one there for a guideline. What do you say? @DFlhb and Alalch E.: jp×g 02:01, 4 April 2023 (UTC)

I like your proposal. Increasing the likelihood of this page passing an adoption RfC (or, failing that, being easy to adjust based on RfC feedback) was the biggest motivation behind my trim; I also thought it'd make the RfC "tidier"/more productive, because the debate could focus on the key points and not the details. A short policy + a longer guideline feels ideal. DFlhb (talk) 02:10, 4 April 2023 (UTC)
I like this idea as well. DFlhb, my underlying concern with the trim is that the longer advice sections should be kept somewhere, but it makes sense to make the actual policy as concise as possible. We also have Wikipedia:Using neural network language models on Wikipedia which does a good job of separating the policy from everything else.
Even a concise policy proposal runs the risk of derailing because editors don't support all of the points. Would it make sense to run community RfCs on individual questions like "Can LLMs be used at all?", "Should we allow LLMs only for copyediting etc?" and "When is attribution required?" and write the policy based on the responses? –dlthewave 02:52, 4 April 2023 (UTC)
(replacing my previous comment, which was far less coherent than I thought!) Support, but we should make sure to ask specific questions, so the RfCs don't go nowhere like that chaotic WP:VPP discussion. The ones you propose are good, but more complex ones (like "how should LLMs be attributed?") might benefit from being workshopped at WP:VPI first, so that specific options can emerge. DFlhb (talk) 00:12, 10 April 2023 (UTC)
I totally agree. —Alalch E. 03:08, 4 April 2023 (UTC)
(if you read my previous comment which I've now deleted, please read this too, otherwise please disregard this new comment): I was wrong. An RfC wouldn't be convoluted if we draft it well, and we do need an RfC on the big questions. We can't propose a 2,500 word policy for adoption, pray that it passes, and then hope it somehow has enough legitimacy to be enforceable. People will just wikilawyer and say: "there was a consensus to have a policy, but there wasn't a consensus that it should say X rather than Y!" Hence RfCs are indeed needed; I think people had the right idea there. DFlhb (talk) 02:39, 23 April 2023 (UTC)

judges of consensus

This is not a suggestion for improving the essay, just speculation. Closing per WP:FORUM. — The Hand That Feeds You:Bite 11:50, 13 May 2023 (UTC)
The following discussion has been closed. Please do not modify it.

Hello. I only read the main page. I did not read the discussion of users here. If my discussion is repetitive, please archive it.

In the near future, AI will be able to analyze and reason to the extent of human intelligence and will have the ability to judge consensus on Wikipedia. In this case, will AI consensus be acceptable? Will self-aware AI contributions be welcomed to Wikipedia?--Sunfyre (talk) 08:44, 7 May 2023 (UTC)

I suspect whether a self-aware AI is allowed to contribute to wikipedia will be quite far down the list of things to think about should genuine self-aware AI arise, behind questions to do with alignment and rights. Talpedia 15:33, 7 May 2023 (UTC)
I think we're getting a bit ahead of ourselves here. Let's focus on the current capabilities at hand (human editors using "chatbots" to write content and copy-pasting it into Wikipedia) rather than hypothetical future scenarios. –dlthewave 15:44, 7 May 2023 (UTC)
To be helpful in judging/summarizing human consensus in large discussions, it would need to be familiar with the minute details of Wikipedia's policies and guidelines et cetera. Which may be possible, either right now or in the very near future, with fine-tuning and maybe that API they're working on. An RfC closer could conceivably consult such a tool, and perhaps it could even point out things that were missed in a discussion entirely, but to rely on it alone to judge consensus would probably not be acceptable. Otherwise you seem to be suggesting some other things ("self aware AI" editors) that are crystal-ball. VintageVernacular (talk) 05:14, 13 May 2023 (UTC)